8 research outputs found
High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation
The task of image-level weakly-supervised semantic segmentation (WSSS) has
gained popularity in recent years, as it reduces the vast data annotation cost
for training segmentation models. The typical approach for WSSS involves
training an image classification network using global average pooling (GAP) on
convolutional feature maps. This enables the estimation of object locations
based on class activation maps (CAMs), which identify the importance of image
regions. The CAMs are then used to generate pseudo-labels, in the form of
segmentation masks, to supervise a segmentation model in the absence of
pixel-level ground truth. In the case of the SEAM baseline, previous work
proposed to improve CAM learning in two ways: (1) importance sampling, which is
a substitute for GAP, and (2) a feature similarity loss, which relies on the
heuristic that object contours almost exclusively align with color edges in
images. In this work, we propose a different probabilistic interpretation of
CAMs for these techniques, rendering the likelihood more appropriate than the
multinomial posterior. As a result, we propose an add-on method that can boost
essentially any previous WSSS method, improving both the region similarity and
contour quality of all implemented state-of-the-art baselines. This is
demonstrated on a wide variety of baselines on the PASCAL VOC dataset.
Experiments on the MS COCO dataset show that performance gains can also be
achieved in a large-scale setting. Our code is available at
https://github.com/arvijj/hfpl
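The GAP-based pipeline described in this abstract can be illustrated with a minimal NumPy sketch. It is not the code from the linked repository: the function names, the random toy inputs, and the 0.3 background threshold are assumptions chosen only to show how CAMs relate to the GAP classifier and how a pseudo-label mask can be derived from them.

```python
import numpy as np

def class_activation_maps(features, weights):
    """features: (C, H, W) feature maps; weights: (K, C) linear classifier.
    Returns one activation map per class, shape (K, H, W)."""
    return np.einsum("kc,chw->khw", weights, features)

def gap_logits(features, weights):
    """Image-level class scores: global average pooling, then the classifier."""
    pooled = features.mean(axis=(1, 2))  # (C,)
    return weights @ pooled              # (K,)

def pseudo_label(cams, threshold=0.3):
    """Per-pixel label from max-normalised CAMs; 0 means background.
    The threshold value is an illustrative assumption."""
    cams = np.maximum(cams, 0.0)
    peak = cams.max(axis=(1, 2), keepdims=True)
    norm = cams / np.maximum(peak, 1e-8)
    labels = norm.argmax(axis=0) + 1          # classes are numbered from 1
    labels[norm.max(axis=0) < threshold] = 0  # weak responses become background
    return labels

rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))          # toy feature maps
weights = rng.standard_normal((3, 8))     # toy classifier for 3 classes
cams = class_activation_maps(features, weights)
logits = gap_logits(features, weights)
masks = pseudo_label(cams)
```

Because pooling and the classifier are both linear, the spatial mean of each CAM equals the corresponding GAP logit, which is why the classifier weights can be reused to localize objects.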
End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments
Coverage path planning is the problem of finding the shortest path that
covers the entire free space of a given confined area, with applications
ranging from robotic lawn mowing and vacuum cleaning, to demining and
search-and-rescue tasks. While offline methods can find provably complete, and
in some cases optimal, paths for known environments, their value is limited in
online scenarios where the environment is not known beforehand, especially in
the presence of non-static obstacles. We propose an end-to-end reinforcement
learning-based approach in continuous state and action space, for the online
coverage path planning problem that can handle unknown environments. We
construct the observation space from both global maps and local sensory inputs,
allowing the agent to plan a long-term path, and simultaneously act on
short-term obstacle detections. To account for large-scale environments, we
propose to use a multi-scale map input representation. Furthermore, we propose
a novel total variation reward term for eliminating thin strips of uncovered
space in the learned path. To validate the effectiveness of our approach, we
perform extensive experiments in simulation with a distance sensor, surpassing
the performance of a recent reinforcement learning-based approach.
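To see why a total variation term discourages thin uncovered strips, consider the following sketch. The abstract does not give the exact reward formula, so `tv_reward` and its `weight` parameter are hypothetical; only the anisotropic total variation of a grid map is standard.

```python
import numpy as np

def total_variation(coverage):
    """Anisotropic total variation of a 2D map: the sum of absolute
    differences between vertically and horizontally adjacent cells."""
    coverage = np.asarray(coverage, dtype=float)
    vertical = np.abs(np.diff(coverage, axis=0)).sum()
    horizontal = np.abs(np.diff(coverage, axis=1)).sum()
    return float(vertical + horizontal)

def tv_reward(prev_coverage, new_coverage, weight=1.0):
    """Hypothetical reward term: positive when a step lowers the total variation."""
    return weight * (total_variation(prev_coverage) - total_variation(new_coverage))

# Two coverage maps with six uncovered cells each (1 = covered, 0 = uncovered).
strip = np.ones((6, 6))
strip[:, 2] = 0        # thin strip, one cell wide
block = np.ones((6, 6))
block[2:4, 1:4] = 0    # compact 2x3 block

tv_strip = total_variation(strip)
tv_block = total_variation(block)
reward_for_covering_strip = tv_reward(strip, np.ones((6, 6)))
```

With the same uncovered area, the strip has the longer boundary and therefore the higher total variation, so an agent rewarded for reducing this quantity has an incentive to avoid leaving such strips behind.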
Camera-Based Friction Estimation with Deep Convolutional Neural Networks
In recent years, great progress has been made in the field of deep learning, and more specifically in neural networks. Deep convolutional neural networks (CNNs) have been especially successful in image processing tasks such as image classification and object detection. Car manufacturers, among other actors, are starting to realize the potential of deep learning and have begun applying it to autonomous driving. This is not a simple task, and many challenges still lie ahead. One sub-problem that needs to be solved is automatically determining the road conditions, including the friction. Since many modern cars are equipped with cameras, it is natural to approach this problem with CNNs, which is what has been done in this thesis. First, a data set is gathered consisting of 37,000 labeled road images taken through the front window of a car. Second, CNNs are trained on this data set to classify the friction of a given road. Gathering road images and labeling them with the correct friction is a time-consuming and difficult process that requires human supervision. For this reason, experiments are also made on a second data set, which consists of 54,000 simulated images. These images are captured from the racing game World Rally Championship 7 and are used in addition to the real images to investigate whether simulated data can improve network performance and thereby reduce the amount of real data required. Experiments conducted during this thesis show that CNNs are a good approach to the problem of estimating road friction. The limiting factor, however, is the data set. Not only does the data set need to be much bigger, but it also has to include a much wider variety of driving conditions. Friction is a complex property that depends on many variables, and CNNs are only effective on the type of data that they have been trained on.
For these reasons, new data has to be gathered by actively seeking different driving conditions in order for this approach to be deployable in practice.
Importance Sampling CAMs for Weakly-Supervised Segmentation with Highly Accurate Contours
Classification networks have been used in weakly-supervised semantic
segmentation (WSSS) to segment objects by means of class activation maps
(CAMs). However, without pixel-level annotations, they are known to (1) mainly
focus on discriminative regions, and (2) produce diffuse CAMs without
well-defined prediction contours. In this work, we alleviate both problems by
improving CAM learning. First, we incorporate importance sampling based on the
class-wise probability mass function induced by the CAMs to produce stochastic
image-level class predictions. This results in segmentations that cover a
larger extent of the objects, as shown in our empirical studies. Second, we
formulate a feature similarity loss term, which further improves the alignment
of predicted contours with edges in the image. Furthermore, we shed new light
onto the problem of WSSS by measuring the contour F-score as a complement to
the common area mIoU metric. We show that our method significantly outperforms
previous methods in terms of contour quality, while matching state-of-the-art
on region similarity.
Comment: Additional experiments/results
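The importance sampling step mentioned in this abstract can be sketched roughly as follows. The abstract only states that a class-wise probability mass function is induced by the CAMs; using a spatial softmax to obtain it, and returning the CAM value at the sampled position as the image-level score, are assumptions made for illustration, as are all function names.

```python
import numpy as np

def spatial_pmf(cams):
    """Turn each class's CAM, shape (K, H, W), into a pmf over the H*W
    positions. A spatial softmax is assumed here."""
    flat = cams.reshape(cams.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(flat)
    return exp / exp.sum(axis=1, keepdims=True)

def sample_image_level_scores(cams, rng):
    """For each class, sample one position from its pmf and return the CAM
    value at that position together with the sampled flat index."""
    pmf = spatial_pmf(cams)
    flat = cams.reshape(cams.shape[0], -1)
    idx = np.array([rng.choice(flat.shape[1], p=p) for p in pmf])
    return flat[np.arange(flat.shape[0]), idx], idx

rng = np.random.default_rng(0)
cams = rng.standard_normal((3, 4, 4))   # toy CAMs for 3 classes
pmf = spatial_pmf(cams)
scores, idx = sample_image_level_scores(cams, rng)
gap_scores = cams.mean(axis=(1, 2))     # the deterministic GAP alternative
```

Unlike GAP, which always averages over every position, the sampled score is stochastic and favors high-activation positions without being restricted to the single most discriminative one.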
Balanced Product of Experts for Long-Tailed Recognition
Many real-world recognition problems suffer from an imbalanced or long-tailed
label distribution. Those distributions make representation learning more
challenging due to limited generalization over the tail classes. If the test
distribution differs from the training distribution, e.g. uniform versus
long-tailed, the problem of the distribution shift needs to be addressed. To
this end, recent works have extended softmax cross-entropy using margin
modifications, inspired by Bayes' theorem. In this paper, we generalize several
approaches with a Balanced Product of Experts (BalPoE), which combines a family
of models with different test-time target distributions to tackle the imbalance
in the data. The proposed experts are trained in a single stage, either jointly
or independently, and fused seamlessly into a BalPoE. We show that BalPoE is
Fisher consistent for minimizing the balanced error and perform extensive
experiments to validate the effectiveness of our approach. Finally, we
investigate the effect of Mixup in this setting, discovering that
regularization is a key ingredient for learning calibrated experts. Our
experiments show that a regularized BalPoE can perform remarkably well in test
accuracy and calibration metrics, leading to state-of-the-art results on
CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018 datasets. The code will be made
publicly available upon paper acceptance.
Comment: 19 pages, under review
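As a rough illustration of the two ingredients named in this abstract, the sketch below shows a prior-based logit adjustment and a product-of-experts fusion. The abstract does not specify how BalPoE parameterizes its experts, so the `tau` parameter, the toy prior, and the function names are assumptions; the fusion shown is the generic normalized geometric mean of the experts' predictive distributions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def adjust_logits(logits, class_prior, tau):
    """Assumed form of a margin modification: shift each logit by
    tau * log(prior) inside the training loss. tau = 0 is plain softmax."""
    return logits + tau * np.log(class_prior)

def product_of_experts(expert_logits):
    """Fuse experts by averaging their log-probabilities, which equals the
    normalised geometric mean of their softmax distributions."""
    mean_log_prob = np.mean([log_softmax(l) for l in expert_logits], axis=0)
    return np.exp(log_softmax(mean_log_prob))

class_prior = np.array([0.7, 0.2, 0.1])   # toy long-tailed label distribution
expert_logits = [np.array([2.0, 0.5, -1.0]), np.array([0.3, 1.2, 0.8])]
fused = product_of_experts(expert_logits)
adjusted = adjust_logits(expert_logits[0], class_prior, tau=1.0)
```

Averaging in log space means that a class must receive support from all experts to score highly, which is the usual motivation for a product rather than a mixture of experts.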